


United States Patent [19] 

Eleftheriadis et al.



US006092107A

[11] Patent Number: 6,092,107
[45] Date of Patent: *Jul. 18, 2000



[54] SYSTEM AND METHOD FOR INTERFACING 
MPEG-CODED AUDIOVISUAL OBJECTS 
PERMITTING ADAPTIVE CONTROL 

[75] Inventors: Alexandros Eleftheriadis; Yihao Fang;
Hari Kalva, all of New York; Atul
Puri, Riverdale, all of N.Y.; Robert
Lewis Schmidt, Howell, N.J.

[73] Assignees: AT&T Corp; Columbia University, 
both of New York, N.Y. 

[ * ] Notice: This patent issued on a continued pros- 
ecution application filed under 37 CFR 
1.53(d), and is subject to the twenty year 
patent term provisions of 35 U.S.C. 
154(a)(2). 



A. Eleftheriadis et al., "Stored File Format for Object-based 
Audio Visual Representation", pp. 1-8. 

A. Basso et al., "Improved Data Access and Streaming
Modes for the MPEG-4 File Format", pp. 1-12.

J. Heid, "Watch This: Streaming Video on Your Web Site", 
create WEB, Apr. 1998, pp. 109-112. 

A. Griffin, "Video on the Net", Popular Mechanics, Mar. 
1998, pp. 51-53. 

L. Chiariglione, "MPEG and Multimedia Communications", 
IEEE Transactions on Circuits and Systems for Video Tech- 
nology, vol. 7, No. 1, Feb. 1997, pp. 5-18. 

(List continued on next page.) 



[21] Appl. No.: 09/055,934 
[22] Filed: Apr. 7, 1998 



Related U.S. Application Data

[60] Provisional application No. 60/042,798, Apr. 7, 1997.

[51] Int. Cl.7 H04N 7/10; H04H 1/02
[52] U.S. Cl. 709/217; 348/10; 455/6.2
[58] Field of Search 709/217-219;
     345/327; 348/6, 7, 10, 12, 13, 725-728;
     455/3.1-6.3; 370/536, 542

[56] References Cited

U.S. PATENT DOCUMENTS

5,563,648   10/1996   Menand et al.        348/13
5,596,565    1/1997   Yonemitsu et al.     369/275.3
5,754,242    5/1998   Ohkami               348/441
5,794,250    8/1998   Carino, Jr. et al.   707/104



OTHER PUBLICATIONS 

J. Laier et al., "Content-Based Multimedia Data Access in
Internet Video Communication", First International Work-
shop on Wireless Image/Video Communications, Sep. 1996,
pp. 126-133.



Primary Examiner— John W. Miller 

[57] ABSTRACT 

The invention provides a system and method allowing the 
adaptation of a nonadaptive system for playing/browsing 
coded audiovisual objects, such as the parametric system of 
MPEG-4. The system of the invention is referred to as the 
programmatic system, and incorporates adaptivity on top of 
the parametric system. The parametric system of MPEG-4 
consists of a Systems Demultiplex (Demux) overseen by 
digital media integration framework (DMIF), scene graph 
and media decoders, buffers, compositor and renderer. Adap-
tations possible with the invention include interfaces in the
categories of media decoding, user functionalities and
authoring, thus allowing a number of enhanced functional-
ities in response to user input as well as graceful degradation
in response to limited system resources. The invention 
includes a specification of an interfacing method in the form 
of an application programming interface (API). Hot object, 
directional, trick mode, transparency and other interfaces are 
specified. 
SYSTEM AND METHOD FOR INTERFACING
MPEG-CODED AUDIOVISUAL OBJECTS
PERMITTING ADAPTIVE CONTROL

CROSS-REFERENCE TO RELATED APPLICATION

This application is related to U.S. Provisional Application
Ser. No. 60/042,798 from which priority is claimed.

BACKGROUND OF THE INVENTION

1. Field of Invention

The invention relates to the field of coded multimedia and
its storage and delivery to users, and more particularly to
such coding when either the channel and decoding resources
may be limited and time varying, or user applications require
advanced interaction with coded multimedia objects.

2. Description of Related Art

Digital multimedia offers advantages including
manipulation, multigeneration processing, error robustness
and others, but incurs constraints due to the storage capacity
or transmission bandwidth required, and thus frequently
requires compression or coding for practical applications.
Further, in the wake of rapid increases in demand for digital
multimedia over the Internet and other networks, the need
for efficient storage, networked access, search and retrieval,
a number of coding schemes, storage formats, retrieval
techniques and transmission protocols have evolved. For
instance, for image and graphics files, GIF, TIF and other
formats have been used. Similarly, audio files have been
coded and stored in RealAudio, WAV, MIDI and other
formats. Animations and video files have often been stored
using GIF89a, Cinepak, Indeo and others.

To play back the plethora of existing formats, decoders
and interpreters are often needed and may offer various
degrees of speed and quality performance depending on
whether these decoders and interpreters are implemented in
hardware or in software, and particularly in the case of
software, on the capabilities of the host computer. If such
content is embedded in web pages accessed via a computer
(e.g. a PC), the web browser needs to be set up correctly for
all the anticipated content and recognize each type of
content and support a mechanism of content handlers
(software plugins or hardware) to deal with such content.

The need for interoperability, guaranteed quality and
performance and economies of scale in chip design, as well
as the cost involved in content generation for a multiplicity
of formats has led to advances in standardization in the
areas of multimedia coding, packetization and robust
delivery. In particular, ISO MPEG (International Standards
Organization Motion Picture Experts Group) has standardized
bitstream syntax and decoding semantics for coded
multimedia in the form of two standards referred to as MPEG-1
and MPEG-2. MPEG-1 was primarily intended for use on
digital storage media (DSM) such as compact disks (CDs),
whereas MPEG-2 was primarily intended for use in a
broadcast environment (transport stream), although it also
supports an MPEG-1 like mechanism for use on DSM
(program stream). MPEG-2 also included additional features
such as DSM Command and Control for basic user
interaction as may be needed for standardized playback of
MPEG-2, either standalone or networked.

With the advent of inexpensive boards/PCMCIA cards
and with availability of Central Processing Units (CPUs),
the MPEG-1 standard is becoming commonly available for
playback of movies and games on PCs. The MPEG-2
standard on the other hand, since it addresses relatively
higher quality applications, is becoming common for
entertainment applications via digital satellite TV, digital cable
and Digital Versatile Disk (DVD). Besides the applications
[...] noted, [...] MPEG-2 [...]
be utilized in various other configurations, in streams
communicated over network and streams stored over hard disks/
CDs, as well as in the combination of networked and local
access.

The success of MPEG-1 and MPEG-2, the bandwidth
limitation of Internet and mobile channels, the flexibility of
web-based data access using browsers, and the increasing
need for interactive personal communication has opened up
new paradigms for multimedia usage and control. In
response, ISO-MPEG started work on a new standard,
MPEG-4. The MPEG-4 standard has addressed coding of
information in the form of individual objects
and a system for composition and synchronized playback of
these objects. While the MPEG-4 development of such a
fixed parametric system continues, in the meantime, new
paradigms in communication, software and networking such
as that offered by the Java language have offered new
opportunities for flexibility, adaptivity and user interaction.

For instance, the advent of the Java language offers
networking and platform independence critical to
downloading and executing of applets (java classes) on a client PC
from a web server which hosts the web pages visited by the
user. Depending on the design of the applet, either a single
access to the data stored on the server may be needed and all
the necessary data may be stored on the client PC, or several
partial accesses (to reduce storage space and time needed for
startup) may be needed. The latter scenario is referred to as
streamed playback.

As noted, when coded multimedia is used for Internet and
local networked applications on a computer like a PC, a
number of situations may arise. First, the bandwidth for
networked access of multimedia may be either limited or
time-varying, necessitating transmission of the most
significant information only and followed by other information as
more bandwidth becomes available.

Second, regardless of the bandwidth available, the client
side PC on which decoding may have to take place may be
limited in CPU and/or memory resources, and furthermore,
these resources may be time-varying. Third, a multimedia
user (consumer) may require highly interactive nonlinear
browsing and playback; this is not unusual, since a lot of
textual content on web pages is capable of being browsed
using hyperlinked features and the same paradigm is
expected for presentations employing coded audio-visual
objects. The parametric MPEG-4 system may only be able
to deal with the aforementioned situations in a very limited
way, such as by dropping objects or temporal occurrences of
objects it is incapable of decoding or presenting, resulting in
choppy audio-visual presentations. Further, MPEG-4 may
not offer any sophisticated control by the user of those kinds
of situations. To get around such limitations of the
parametric system, one potential option for MPEG-4 development is
in a programmatic system.

The use of application programming interfaces (APIs) has
been long recognized in the software industry as a means to
achieve standardized operations and functions over a num-
ber of different types of computer platforms. Typically,
although operations can be standardized via definition of the
API, the performance of these operations may still differ on
various platforms as specific vendors with interest in a
specific platform may provide implementations optimized
for that platform. In the field of graphics, Virtual Reality
Modeling Language (VRML) allows a means of specifying
spatial and temporal relationships between objects and
description of a scene by use of a scene graph approach.
MPEG-4 has used a binary representation (BIFS) of the
constructs central to VRML and extended VRML in many
ways to handle real-time audio/video data and facial/body
animation. To enhance features of VRML and to allow
programmatic control, DimensionX has released a set of
APIs known as Liquid Reality. Recently, Sun Microsystems
has announced an early version of Java3D, an API specifi-
cation which among other things supports representation of
synthetic audiovisual objects as a scene graph. Sun Microsys-
tems has also released Java Media Framework Player API,
a framework for multimedia playback. However, none of the
currently available API packages offer a comprehensive and
robust feature set tailored to the demands of MPEG-4 coding
and other advanced multimedia content.

SUMMARY OF THE INVENTION

The invention provides a system and method for inter-
facing coded audiovisual objects, allowing a nonadaptive
client system, such as the parametric MPEG-4 system, to
play and browse coded audiovisual objects in adaptive
fashion. The system and method of the invention is pro-
grammatic at an architectural level, and adds a layer of
adaptivity on top of the parametric system by virtue of a
defined set of application programming interfaces specifi-
cally configured to access and process MPEG-4 coded data.

MPEG-4, familiar to persons skilled in the art, can be 
considered a parametric system consisting of a Systems 
Demultiplex (Demux) overseen by digital media integration 
framework (DMIF), scene graph and media decoders,
buffers, compositor and renderer. Enhancements or exten-
sions offered by the system and method of the invention to
standard MPEG-4 include a set of defined APIs in the
categories of media decoding, user functionalities and
authoring which client applications can invoke. By provid-
ing this powerful audiovisual interface facility, the invention
allows a number of enhanced realtime and other functions in
response to user input, as well as graceful degradation in the
face of limited system resources available to MPEG-4
clients. 

The invention is motivated in part by the desirability of 
standardized interfaces for MPEG-4 playback and browsing 
under user control, as well as effective response to time-
varying local and networked resources. Interfaces specified
in the invention are intended to facilitate adaptation of coded
media data to immediately available terminal resources. The
specified interfaces also facilitate interactivity expected to
be sought by users, either directly as a functionality or
indirectly embedded in audiovisual applications and ser- 
vices expected to be important in the future. 


The invention specifies an interfacing method in the form 
of a robust application programming interface (API) speci- 
fication including several categories. In the category of 
media decoding, a visual decoding interface is specified. In 
the category of user functionality, progressive, hot object, 
directional, trick mode and transparency interfaces are 
specified. In the category of user authoring, a stream editing 
interface is specified. The overall set of interfaces, although 
not an exhaustive set, facilitates a substantial degree of 
adaptivity. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will be described with reference to the 
accompanying drawings, in which like elements are desig- 
nated by like numbers and in which: 

FIG. 1 illustrates a high level block diagram of the system 
illustrating an embodiment of the invention; 

FIG. 2 illustrates a block diagram of the system
illustrating details of the embodiment of the invention;

FIG. 3 illustrates an interface method for visual decoding 
according to the invention; 

FIG. 4 illustrates an interface method for functionalities 
according to the invention; and 

FIG. 5 illustrates an interface method for authoring 
according to the invention. 

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS 

The system and method of the invention will be described
in the environment of MPEG-4 decoding, in which envi-
ronment the invention specifies not a single API but a
collection of APIs that address various interfaces for an
extended MPEG-4 system. The Java language, familiar to
persons skilled in the art, is used for specification of the 
APIs, and is executed on general or special purpose proces- 
sors with associated electronic memory, storage, buses and 
related components familiar to persons skilled in the art. In 
the invention three categories of API are illustratively 
identified, and representative functions in each category are 
provided. 

The three illustrative API categories are as follows: 

Media Decoding 
User Functionality 
Authoring 

The specific APIs presented by the invention as well as a 
way of organizing the implementations of such APIs are first 
summarized in the following table, and described below. 

TABLE 1

APIs of invention

No.  API Category and   Explanation
     specifics

     Media Decoding

1.   Visual Decoding    Decoding of visual objects with or without
                        scalability from a coded bitstream

     Functionality

2.   Progressive        Progressive decoding and composition of an
                        AV object under user control
3.   Hot Object         Decoding, enhancement and composition of an
                        AV object based on user control
4.   Directional        Decoding of AV object with viewpoint (or
                        acoustic) directionality selected by user
5.   Trick Mode         Decoding of portions of AV object and
                        composition of an AV object under user control
6.   Transparency       Decoding, refinement and composition of an
                        AV object based on transparency and user control

     Partial Authoring

7.   Stream Editing     Editing of MPEG-4 bitstream to modify content
                        without decoding and reencoding


Packages are a means to organize the implementation of 
APIs. Taking into account the library of APIs presented by 
the invention, a partial list of packages follows. 

mpgj.dec

This package contains classes for user functionalities
including interaction.

mpgj.func

This package contains classes for user functionalities
including preferences.

mpgj.util

This package contains classes that provide interfaces to 
various input, output, sound and video devices. 

The system and method of the invention as well as the
associated interface methods (APIs) will now be described. 

FIG. 1 illustrates a high level block diagram of a system 
implementation of the invention. The implementation con- 
sists of two major components. The first is the known 
parametric system which consists of Digital Media Integra- 
tion Framework (DMIF) 160, providing delivery interface to 
the channel 155, and connecting to the Systems Demux 165, 
the output of which goes through a sequence of blocks, 
represented for simplicity as an aggregated block: BIFS and
Media Decoders, Buffers, Compositor and Renderer 170, the 
output of which on line 175 is presented to Display. The 
second major component consists of an MPEG Application/ 
Applet (MPEG App) 100, which interfaces to the external 
Authoring Unit 130, and User Input respectively via 120, the 
Authoring API and Functionality API of the invention. 
Further, the Java Virtual Machine and Java Media Frame-
work (JVM and JMF) 110 are used as the underlying basis
to connect to BIFS and Media Decoders, Buffers, Composi-
tor and Renderer 170, as well as directly interfaces to BIFS
and Media Decoders, Buffers, Compositor and Renderer 
170, via the Scene Graph API 150 (provided by MPEG and 
used in the invention) and the Decoder API. 

FIG. 2 illustrates in greater detail the various blocks,
components and interfaces of FIG. 1. The Authoring Unit
130 is shown interfacing on line 200 to MPEG App 100,
separately from the User Input 140 which interfaces via line
205. The respective interfaces, Authoring API 290 as well as
Functionality API 295, are also shown. In addition, MPEG
App 100, and the underlying JVM and JMF 110, are shown
acting upon BIFS Decoder and Scene Graph 225 via line 
215, as well as interfacing via 207 to Scene Graph API, 210. 
The BIFS Decoder and Scene Graph 225 controls instantia- 
tion of a number of media decoders, 270, 271, 272 via lines 
260, 261, 262, and also controls (via lines 268 and 269) the 
Compositor 282 and the Renderer 284. The JVM and JMF 
110, associated with MPEG App 100, can also control media 
decoders 270, 271 and 272 via respective lines 263, 264, 
265. For FIG. 2, up to now, the various programmatic 
controls and interfaces have been discussed. 

The remaining portion of FIG. 2 provides details of the 
MPEG-4 parametric system, on top of which the operation 
of the programmatic system and method of the invention 
will now be examined. An MPEG-4 system bitstream to be 
decoded arrives via channel 155 to the network/storage 
delivery interface DMIF 160, which passes this over line 
230 to the DeMux 165. The depacketized and separated 
bitstream consists of portions that contain the scene descrip- 
tion information and are sent to BIFS Decoder and Scene 
Graph 225. The bitstream also contains other portions 
intended for each of the various media decoders and pass 
respectively through lines 240, 245 and 250, decoding 
Buffers, 251, 252 and 253, lines 255, 256, and 257, to arrive 
at media decoders 270, 271 and 272 which output the 
decoded media on lines 273, 274 and 275 which form input 
to composition Buffers 276, 277 and 278. The output of 
Buffer 276 on line 279 passes to the compositor along with 
output of Buffer 277 on line 280 and the output of Buffer 278 
on line 281. 

Although only three sets of media decoding operations are 
shown (via decoding Buffers 251, 252, 253, Decoders 270, 
271, 272, and composition Buffers 276, 277, 278), in prac- 
tice the number of media decoders may be as few as one, or 
as many as needed. The Compositor 282 positions the 
decoded media relative to each other based on BIFS Scene 
Graph (and possibly user input) and composes the scene, and 
this information is conveyed via line 283 to the Renderer 
284. Renderer 284 renders the pixels and audio samples and 
sends them via line 175 to a display (with speakers, not 
shown) 285. 
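
The data flow just described can be sketched in Java, the language used for the APIs of the invention. The sketch is illustrative only: the class names (ParametricPipelineSketch, MediaChain), the string payloads and the queue-based buffers are assumptions made for this example and are not part of MPEG-4 or of the APIs specified here. It models one chain per media stream, in which a decoding buffer feeds a decoder whose output fills a composition buffer, and a stand-in compositor that drains one unit from each chain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Illustrative sketch only; every name in this file is hypothetical.
class ParametricPipelineSketch {

    // One decoding buffer -> decoder -> composition buffer chain
    // (compare 251 -> 270 -> 276 in FIG. 2).
    static final class MediaChain {
        private final Queue<String> decodingBuffer = new ArrayDeque<>();
        private final Queue<String> compositionBuffer = new ArrayDeque<>();
        private final Function<String, String> decoder;

        MediaChain(Function<String, String> decoder) {
            this.decoder = decoder;
        }

        // The demultiplexer hands a coded unit to this chain.
        void push(String codedUnit) {
            decodingBuffer.add(codedUnit);
        }

        // Decode everything currently waiting and queue it for composition.
        void decodePending() {
            while (!decodingBuffer.isEmpty()) {
                compositionBuffer.add(decoder.apply(decodingBuffer.remove()));
            }
        }

        // The compositor takes the next decoded unit, or null if none is ready.
        String poll() {
            return compositionBuffer.poll();
        }
    }

    // Stand-in for the compositor: gathers one decoded unit per chain, in chain order.
    static List<String> compose(List<MediaChain> chains) {
        List<String> scene = new ArrayList<>();
        for (MediaChain chain : chains) {
            String unit = chain.poll();
            if (unit != null) {
                scene.add(unit);
            }
        }
        return scene;
    }

    public static void main(String[] args) {
        MediaChain video = new MediaChain(u -> "decoded(" + u + ")");
        MediaChain audio = new MediaChain(u -> "decoded(" + u + ")");
        video.push("video-unit-0");
        audio.push("audio-unit-0");
        video.decodePending();
        audio.decodePending();
        System.out.println(compose(Arrays.asList(video, audio)));
    }
}
```

A chain whose composition buffer is empty simply contributes nothing to the composed scene, which loosely mirrors the parametric system dropping objects it cannot decode in time.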

FIG. 3 illustrates the media decoding aspect of the inven- 
tion using visual decoding as the specific example. For 
simplicity, media decoding is referred to simply as decoding. 
Some assumptions are necessary regarding the availability 
of BaseAVObject or VideoDecoder constructs; these 
assumptions are typical of the situation in object oriented 
programming where such abstract classes containing default 
or placeholder operations are often extended by overriding
their constructs.
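
The object-oriented assumption noted above can be illustrated with a minimal Java sketch. The abstract class below is a hypothetical stand-in for a construct such as VideoDecoder, which is mentioned but not shown; its methods, the string return type and the subclass name are assumptions made for illustration only.

```java
// Illustrative sketch only: VideoDecoderSketch stands in for an abstract
// class such as VideoDecoder, whose actual members are not shown in the text.
class AbstractDecoderSketch {

    abstract static class VideoDecoderSketch {
        // Placeholder operation: a default that subclasses are expected to override.
        String decode(byte[] codedData) {
            return "undecoded: " + codedData.length + " bytes";
        }

        // Shared behaviour that subclasses inherit unchanged.
        boolean accepts(byte[] codedData) {
            return codedData != null && codedData.length > 0;
        }
    }

    // A concrete decoder extends the abstract class and overrides the placeholder.
    static final class BaseLayerDecoderSketch extends VideoDecoderSketch {
        @Override
        String decode(byte[] codedData) {
            return "base-layer frame from " + codedData.length + " bytes";
        }
    }

    public static void main(String[] args) {
        VideoDecoderSketch decoder = new BaseLayerDecoderSketch();
        byte[] data = {1, 2, 3};
        if (decoder.accepts(data)) {
            System.out.println(decoder.decode(data));
        }
    }
}
```

Callers hold a reference of the abstract type, so a different decoder can be substituted without changing the calling code.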

The Decoding API 220 represents an interface, more 
specifically, the Visual Decoding API 301. Using the Visual 
Decoding API 301 it is possible to instantiate a number of 
different Visual Decoders 320. The instantiation can be 
thought of as a control via the block Selective Decode Logic 
(SDL) 306, which is shown to belong along with other 
pieces of logic to BIFS Dec(oder) Logic 305, a portion or 
component of the BIFS Decoder and Scene Graph 225. The 
BIFS Dec Logic 305 exerts control on various visual 
decoders, such as the Base Video Decoder 313, via control 
line 307, the Temporal Enhancement Decoder 314 via 
control line 308, the Spatial Enhancement Decoder 315 via 
control line 309, the Data Partitioning Decoder 316 via
control line 310, the Image Texture Decoder 317 via control
line 311, the Mesh Geometry/Motion Decoder 318 via line
312 and so forth. The bitstream to be decoded is applied via
line 319 (which corresponds to media decoder input of 255
or 256 or 257 in FIG. 2) to the appropriate decoder and the
decoded output is available on line 325 (which corresponds
to media decoder output of 273 or 274 or 275 in FIG. 2). The
Base Video Decoder 313 decodes the nonscalable video, the
Spatial Enhancement Decoder 315 decodes the spatial
scalability video layer/s, the Temporal Enhancement Decoder
314 decodes the temporal scalability video layer/s, the Data
Partitioning Decoder 316 decodes the data partitioned video
layer/s, the Image Texture Decoder 317 decodes the spatial/
SNR scalable layers of still image texture, and the Mesh
Geometry/Motion Decoder 318 decodes the wireframe mesh
node location and movement of these nodes with the
movement of the object. Such decoders are specified by the
MPEG-4 visual standard known in the art. Details of this
category of API presented by the invention used to access
the MPEG-4 visual decoders in a flexible and consistent
manner will now be described.

Decoding API

Class mpgj.dec.BaseAVObject

public class BaseAVObject

This is a basic class allowing decoding of base AV object
stream.

Constructors

public BaseAVObject()

Methods

public void startDec()
Start decoding of data.

public void stopDec()
Stop decoding of data.

public void attachDecoder(Mp4Stream basestrm)
Attach a decoder to basestrm in preparation to decode a valid
MPEG-4 stream whose decoding is to take place.

Visual Decoding API

Class mpgj.dec.Mp4VDecoder

public class Mp4VDecoder
extends VideoDecoder

This class extends VideoDecoder, an abstract class (not
shown). It contains methods to decode various types of
visual bitstreams.

Constructors

public Mp4VDecoder()

Methods

public VObject baseDecode(Mp4Stream basestrm)
Decodes a base MPEG-4 video stream, basestrm, and
returns a decoded visual object, VObject.

public VObject sptEnhDecode(Mp4Stream enhstrm)
Decodes a spatial enhancement MPEG-4 video stream,
enhstrm, and returns a decoded visual object, VObject.

public VObject tmpEnhDecode(Mp4Stream enhstrm)
Decodes a temporal enhancement MPEG-4 video stream,
enhstrm, and returns a decoded visual object, VObject.

public VObject snrEnhDecode(Mp4Stream enhstrm, int level)
Depending on the level, decodes an SNR enhancement
MPEG-4 video stream, enhstrm, and returns a decoded visual
object, VObject.

public VObject datapartDecode(Mp4Stream enhstrm, int level)
Depending on the level, decodes a data partitioned MPEG-4
video stream, enhstrm, and returns a decoded visual object,
VObject.

public VObject trickDecode(Mp4Stream trkstrm, int mode)
Depending on the mode, skips and decodes the trick stream,
trkstrm, and returns a decoded visual object, VObject.

public MeshObject meshAuxDecode(Mp4Stream auxstrm)
Decodes an MPEG-4 auxiliary video stream, auxstrm, and
returns a mesh object, MeshObject, which includes mesh
geometry and motion vectors.

FIG. 4 describes the functionality of certain aspects of the
invention using a number of example functionalities for
which interfaces are defined in terms of another category of
API. The Functionality API 295 represents interfaces, more
specifically, for Trick Mode (401), Directional (402),
Transparency (403), Hot Object (404) and Progressive (405)
functions. Using each of the APIs it is possible to instantiate
a number of different decoders; visual decoders are again
used as an example. The instantiation can be thought of as
a control via several Selective Decode Logic (SDL) blocks,
416, 417, 418, 419, 420, which are shown to belong to
APP/BIFS Dec(oder) Logic 415, component of the BIFS
Decoder and Scene Graph 225 or/and the MPEG-4 App 100.
The APP/BIFS Dec Logic 415 exerts control on various
visual decoders, such as the Base Video Decoder 313 via
control lines 421, 422, 424, 425, the Temporal Enhancement
Decoder 315 via control lines 423 and 426, the Spatial
Enhancement Decoder 315 via control line 427, the Data
Partitioning Decoder 316 via control line 429, the Image
Texture Decoder 317 via control line 430, the Mesh
Geometry/Motion Decoder 318 via line 428 and so forth.

The bitstream to be decoded is applied via line 431 (which
corresponds to media decoder input of 255 or 256 or 257 in
FIG. 2) to the appropriate decoder and the decoded output is
available on line 445 (which corresponds to media decoder
output of 273 or 274 or 275 in FIG. 2). It is important to
realize that often a functionality relative to visual
objects may be realized by use of one or more visual
decoders. The SDL is used to not only make a selection
between the specific decoder to be instantiated for decoding
each visual object, but also, the decoder used for a piece of
the bitstream, and the specific times during which it is
engaged or disengaged. A number of SDL, 416, 417, 418,
419, 420 are shown, one corresponding to each functionality.
Each SDL in this figure has one control input but one of the
several potential control outputs. Further, for clarification, as
in the case of FIG. 3, the Base Video Decoder 313 decodes
the nonscalable video, the Spatial Enhancement Decoder
314 decodes the spatial scalability video layer/s, the
Temporal Enhancement Decoder 315 decodes the temporal
scalability video layer/s, the Data Partitioning Decoder 316
decodes the data partitioned video layer/s, the Image Texture
Decoder 317 decodes the spatial/SNR scalable layers of still
image texture, and the Mesh Geometry/Motion Decoder 318
decodes the wireframe mesh node location and movement of
these nodes with the movement of the object; such decoders
are again specified by the MPEG-4 visual standard. Details
of this category of APIs presented by the invention used to
achieve these functionalities in a flexible and robust manner
will now be described.

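
The role of the Selective Decode Logic described for FIG. 4, with one control input and one of several potential control outputs per functionality, can be sketched in Java as follows. This is a sketch under stated assumptions and not the logic of the invention: the text does not say which decoders each SDL can reach, so the wiring table below is hypothetical, as are the class name, the enum names and the fallback rule used when a requested decoder is not reachable.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names, wiring and fallback rule are hypothetical.
class SelectiveDecodeLogicSketch {

    // The five functionalities named for FIG. 4.
    enum Functionality { TRICK_MODE, DIRECTIONAL, TRANSPARENCY, HOT_OBJECT, PROGRESSIVE }

    // The visual decoders named for FIG. 3 and FIG. 4.
    enum Decoder {
        BASE_VIDEO, TEMPORAL_ENHANCEMENT, SPATIAL_ENHANCEMENT,
        DATA_PARTITIONING, IMAGE_TEXTURE, MESH
    }

    // Assumed wiring: which decoders each SDL may engage (not taken from the text).
    private static final Map<Functionality, Set<Decoder>> OUTPUTS =
            new EnumMap<>(Functionality.class);

    static {
        OUTPUTS.put(Functionality.TRICK_MODE, EnumSet.of(Decoder.BASE_VIDEO));
        OUTPUTS.put(Functionality.DIRECTIONAL,
                EnumSet.of(Decoder.BASE_VIDEO, Decoder.MESH));
        OUTPUTS.put(Functionality.TRANSPARENCY,
                EnumSet.of(Decoder.BASE_VIDEO, Decoder.IMAGE_TEXTURE));
        OUTPUTS.put(Functionality.HOT_OBJECT,
                EnumSet.of(Decoder.BASE_VIDEO, Decoder.TEMPORAL_ENHANCEMENT,
                        Decoder.SPATIAL_ENHANCEMENT));
        OUTPUTS.put(Functionality.PROGRESSIVE,
                EnumSet.of(Decoder.DATA_PARTITIONING, Decoder.IMAGE_TEXTURE));
    }

    // One control input (the requested decoder) yields exactly one control output.
    // If the request is not reachable from this SDL, fall back to the first
    // reachable decoder in enum declaration order.
    static Decoder select(Functionality functionality, Decoder requested) {
        Set<Decoder> allowed = OUTPUTS.get(functionality);
        if (allowed.contains(requested)) {
            return requested;
        }
        return allowed.iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(select(Functionality.HOT_OBJECT, Decoder.SPATIAL_ENHANCEMENT));
        System.out.println(select(Functionality.TRICK_MODE, Decoder.MESH));
    }
}
```

Keeping the wiring in a table separates the selection policy from the decoders themselves, so the times at which a decoder is engaged or disengaged could be varied without touching decoder code.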
Functionality API

The following APIs address the various user interaction
functionalities.

Progressive API

Class mpgj.func.ProgAVObject

public class ProgAVObject
extends BaseAVObject

A ProgAVObject allows progressive refinement of quality
of an AV object under user control. Currently, visual objects
are assumed to be static (still image vops, a Video Object
Plane, which is an instance in time of an arbitrarily shaped
object; when the shape is rectangular, then a vop is identical
to a frame).

Constructors

public ProgAVObject()

Methods

public void startDec()
Start decoding of data.

public void stopDec()
Stop decoding of data.

public void pauseDec()
Temporarily suspend decoding of data.

public void resumeDec()
Restart decoding of data from current state of pause.

public int selectProgLevel()
Select level up to which decoding of transform (DCT or
wavelet) coefficients will take place. A level constitutes
coefficients up to a certain position in scan order.

public void attachDecoder(Mp4Stream srcstrm, int proglvl)
Attach a decoder to srcstrm in preparation to decode a valid
MPEG-4 stream and specifies the prog level up to which
decoding is to take place.

public void offsetStream(Mp4Stream srcstrm, ulong offset)
Allow an offset into the srcstrm as the target where the
decoding may start. In practice, the actual target location
may be beyond the required target and depends on the
location of valid entry point in the stream.

Hot Object/Region API

This API allows interaction with hot (active) AV objects.
It may be extended to allow interaction with hot regions
within an object. This API is intended to allow one or more
advanced functionalities such as spatial resolution
enhancement, quality enhancement, temporal quality
enhancement of an AV object. The actual enhancement that
occurs is dependent on user interaction (via mouse clicks/
menu) and the enhancement streams locally/remotely as
well as enhancement decoders available.

Class mpgj.func.HotAVObject

public class HotAVObject
extends BaseAVObject

HotAVObject is a class that triggers the action of
enhancement of an AVObject provided that the object is a hot object.
Thus hot objects have some enhancement streams associated
with them that are triggered when needed. This class extends
BaseAVObject, which is used primarily to decode base
(layer) streams. Further, the definition of hot objects may be
extended to include regions of interest (KeyRegions).

Constructors

public HotAVObject()

Temporarily suspend decoding of data.

public void resumeDec()
Restart decoding of data from current state of pause.

public int selectHotType()
Select type of enhancement (spatial, quality, temporal etc).

public Mp4Stream enhanceObject(int type)
Use selected enhancement type to obtain needed
enhancement stream.

public void attachDecoder(Mp4Stream srcstrm, int type)
Attach a decoder to srcstrm in preparation to decode a valid
MPEG-4 stream and specifies the type of decoding that is to
take place.

public void offsetStream(Mp4Stream srcstrm, ulong offset)
Allow an offset into the srcstrm as the target where the
decoding may start. In reality, the actual target location may
be beyond the required target and depends on the location of
valid entry point in the stream.

Directional API

This API allows interaction with directionally sensitive
AV objects. It supports static visual objects (still vops),
dynamic visual objects (moving vops), as well as directional
speech and audio. For visual objects it permits a viewpoint
to be selected and only the corresponding bitstreams are
decoded and decoded data forwarded to compositor. For
aural objects an analogous operation takes place depending
on desired aural point. At present, predefined directional
choices are assumed.

Class mpgj.func.DirecAVObject

public class DirecAVObject
extends BaseAVObject

DirecAVObject is a class that allows creation of objects
that respond to x-y-z location in space (in the form of
prequantized directions). This class is most easily explained
by assuming a bitstream composed of a number of static
visual vops coded as an AV object such that depending on
the user interaction, vops corresponding to one or more
viewpoint are decoded as needed. The class is equally
suitable to decoding dynamic AVObjects.

Constructors

public DirecAVObject()

Methods

public void startDec()
Start decoding of data.

public void stopDec()
Stop decoding of data.

public void pauseDec()
Temporarily suspend decoding of data.

public void resumeDec()
Restart decoding of data from current state of pause.

public void loopDec()
This method allows user interactive decoding of a dynamic
visual object as a defined sequence of static vops forming a
closed loop. A similar analogy may be applicable to audio as
well. User selection occurs via mouse clicks or menus.

public int selectDirec()
Select the direction (scene orientation). A number of
prespecified directions are allowed and selection takes place by
clicking a mouse on hot points on the object or via a menu.

public Mp4Stream enhanceObject(int orient)
Use selected scene orientation to obtain needed temporal

Methods auxiliary (enhancement) stream. 

public void startDec 0 public void attachDecoder (Mp4Stream srcstrm, int orient) 

Start decoding of data. Attach temporal auxiliary (enhancement) decoder to srcstrm 

public void stopDec 0 65 in preparation to decode a valid MPEG-4 stream and speci- 

Stop decoding of data. fies the selected scene direction of AV object, 

public void pauseDec Q public void offisetStream (Mp4Stream srcstrm, ulong offset) 
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Allow an oflset into the srcstrm as the target where the computing resources, portions of object being hidden and 

decoding may start. In reality, the actual target location may a re thus not needed, or a much higher quality being needed 

be beyond the required target and depends on the location of for a specific re ion at me mX of Q0 { or ^ 

valid entry point in the stream, t , . r . . ..... 

Trick Mode API m 0 regions. The process of using a key color is similar 

Trick Mode API supports conditional decoding under user $ t0 " chroma ke X" technique in broadcast applications, 

control to permit enhanced trick play capabilities. Enhanced Class mpgj.svs.TranspAVObject 

trick play can be conceived as enabling of VCR/CDPlayer P ubUc class TranspAVObject 

like functions such as different speeds for FF or FR, Freeze extends BawAVObjcct 

„ « j . .. . , ' TranspAVObject is a class that can be used to form objects 

brame, Random Access as well others such as reverse play 10 . - ^ , «,.,.. 

, , ... iL , . », T > r ,^, A „ with transparency miormation. Both aural and visual object 

etc, however with the difference that MPEG-4 can allow ^ ^ handled 

these capabilities on individual AV object basis in addition ^pes are an 

to that on composited full scene basis. Constructors 

Class mpgj.tunc.TrickAVObject „ P ubhc TranspAvObject 0 

public class TrickAVObject Methods 

extends BaseAVObject P ublic void startDec 0 

TrickAVObject is a class that can be used to form objects Start decoding of data, 

that allow decoding suitable for trick play. public void stopDec 0 

Constructors 20 Stop decoding of data, 

public TrickAVObjectO public void pauseDec 0 

Methods Temporarily suspend decoding of data, 

public void startDec 0 public void resumeDec 0 

Start decoding of data. Restart decoding of data from current state of pause, 

public void stopDec 0 25 public int getRegion 0 

ubHc vo^LseDec 0 SeleCt the rcsi ° n by number m a Usted menu or b y clickin g 

. P i °n hotpoints (also translates to a number). 

Temporarily suspend decoding of data. . w i ~. - • 

public void resumeDec 0 P M P 4Stream enhanceObject (int type, int regnum) 

Restart decoding of data from current state of pause. 30 Use enhancement type to obtain needed enhance- 

public void loopDec 0 ment Stream for the region regnum - 

This allows user interactive decoding of selected portions of P ublic void attachDecoder (Mp4Stream srcstrm, int type, int 

the srestream for forward or reverse playback at a variety of regnum) 

speeds. Attach decoder to srcstrm in preparation to decode a region 

public boolean selectDirec 0 35 and its ke y color - 

Select the direction of decoding. Returns true when trick public void oflsetstream (Mp4Stream srcstrm, ulong oflset) 

decoding is done in (normal) forward direction, else it Allow an offset into the srcstrm as the target where the 

returns false when reverse direction for trick decoding is decoding may start. In practice, the actual target location 

selected, may be beyond the required target and depends on the 

public Mp4Stream enhanceObject (boolean decdirec) 40 location of valid entry point in the stream. 

Obtain the MPEG-4 stream to be decoded in direction FIG. 5 illustrates the authoring aspects of the invention 

specified by decdirec using an example of stream editing for which an interface is 

PU d l cd' V °) d attachDeCOder ( M P 4Stream srcstrm, int defined in terms of another category of API. The Authoring 

Ai4 e 1 _ 1 /^ C / j j A . j j API 290 represents authoring-related interfaces, more 

Attach tnck decoder to srcstrm in preparation to decode a 45 . ft <T A Jr lT . - *„,.«. 

valid trick mode MPEG-4 stream and specifies the direction W]^y, Stream Editing API (501). Usmg the API it is 

of decoding. possible to edit/modify bitstreams for use by MPEG App 

public void oflfcetStream (Mp4Stream srcstrm, ulong oflset) ( 10 °) or BIFS Decoder and Scene Graph (225). The API 501 

Allow an oflset into the srcstrm as the target where the exerts control on MPEG APP 100 via control line 505, and 

decoding may start. In reality, the actual target location may 50 on BIFS Decoder and Scene Graph 225 via control line 506. 

be beyond the required target and depends on the location of The Stream Editing API thus can help edit/modify an 

valid entry point in the stream. MPEG-4 bitstream containing various audio- visual media 

Transparency API objects as well as the BIFS scene description. Besides 

Transparency API supports selective decoding of regions Stream mxing ^ other ^ allowing authoring are also 

of an object under user control. In the case of visual objects, 55 possible but are not specified by this invention. Details of the 

it is assumed that encoding is done in a manner where a large Stream mxia% ^ m nQW 

object is segmented into a few smaller regions by changing Authoring 

the transparency of other pixels in the object. The pixels not ^ foUowing ^ addresses (he ^ Qf 

belonging to region of interest are coded by assigning them MPEG-4 bitstreams 

a selected key color not present in the region being coded. 60 stream Editing API 

This API allows decoding under user control such that a few Class mp gj. ut ii.StreamEdit 

or all of the regions may be coded. Further, for a region of public class streamEdit 

interest, enhancement bitstream may be requested to ^ class alk)WS determmation of ^ weU as 

improve the spatial or temporal quality. The key color for 65 modification of MPEG-4 systems streams. Operations such 

each region is identified to compositor. The user may not as access, copy, add, replace, delete and others are sup- 

need to decode all regions either due to limited bandwidth/ ported. 
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Constructors 
public StreamEdit()
Methods 

public int[] getObjectList(Mp4Stream srcstrm) 
Returns the list of objects in the srcstrm. The returned object 
is the cumulative table of objects in the bitstream. 
public boolean replaceObject (Mp4Stream srcstrm, ulong
srcobjid, Mp4Stream deststrm, ulong destobjid)
Replaces the occurrences of objects with object id destobjid
in deststrm with corresponding occurrences of object with
object id srcobjid in the srcstrm. The object tables are
updated accordingly. The operation returns true on successful
replace whereas false indicates a failure to replace.

public boolean replaceObjectAt (Mp4Stream srcstrm, ulong 
srcobjid, ulong m, Mp4Stream deststrm, ulong destobjid, 
ulong n) 

Same semantics as replaceObject(), except that the position
at which replacement starts is specified. Replaces the destination
objects from the nth occurrence of destobjid with source objects
from the mth occurrence of srcobjid. For m=n=0, it performs
identically to replaceObject().

public boolean containsObjectType (Mp4Stream srcstrm,
ulong objtype)

Returns true if srcstrm contains an object of objtype, else
returns false.

public boolean addObjects (Mp4Stream srcstrm, ulong 

srcobjid, Mp4Stream deststrm) 
Adds objects of srcobjid from srcstrm to deststrm. Returns
true if successful, else returns false.

public boolean addObjectsAt (Mp4Stream srcstrm, ulong 

srcobjid, Mp4Stream deststrm, ulong destobjid, ulong n) 
Adds objects of srcobjid from srcstrm to deststrm starting 
after nth occurrence of destobjid. Returns true if successful, 
else returns false. 

public boolean copyObjects (Mp4Stream srcstrm, ulong
srcobjid, Mp4Stream deststrm, ulong destobjid)
Copies objects with srcobjid in srcstrm to deststrm with
new object id, destobjid. If deststrm does not exist, it is
created. If it exists it is overwritten. This operation can be
used to create elementary stream objects from multiplexed
streams for subsequent operations. Returns true if
successful, else returns false.

public boolean deleteObjects (Mp4Stream deststrm, ulong 
destobjid) 

Delete all objects with destobjid in deststrm. Also remove all 
composition information. Returns true if successful, else 
returns false. 

public boolean spliceAt (Mp4Stream deststrm, ulong
destobjid, ulong n, Mp4Stream srcstrm)
Splice deststrm after nth occurrence of destobjid and paste
the srcstrm. Returns true if successful, else returns false.
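
The occurrence-based semantics of the Stream Editing methods can be illustrated with a small in-memory model. The following is only a sketch under stated assumptions, not the implementation of the invention: a stream is modelled as a list of hypothetical Unit records (object id plus payload), occurrences are counted from 0 as suggested by the m=n=0 case of replaceObjectAt(), and a replaced unit is assumed to keep the destination object id. The names Unit and StreamEditSketch are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one coded occurrence of an object in a stream. */
final class Unit {
    final long objId;
    final String payload;

    Unit(long objId, String payload) {
        this.objId = objId;
        this.payload = payload;
    }

    @Override
    public String toString() {
        return objId + ":" + payload;
    }
}

/** Illustrative in-memory model of StreamEdit-style operations (assumed semantics). */
final class StreamEditSketch {

    /** Replaces dest occurrences n, n+1, ... of destId with src occurrences m, m+1, ... of srcId. */
    static boolean replaceObjectAt(List<Unit> src, long srcId, int m,
                                   List<Unit> dest, long destId, int n) {
        if (m < 0 || n < 0) {
            return false;
        }
        List<Unit> sources = new ArrayList<>();
        for (Unit u : src) {
            if (u.objId == srcId) {
                sources.add(u);
            }
        }
        int seen = 0;   // occurrences of destId encountered so far
        int next = m;   // index of the next source occurrence to use
        boolean replaced = false;
        for (int i = 0; i < dest.size() && next < sources.size(); i++) {
            if (dest.get(i).objId != destId) {
                continue;
            }
            if (seen >= n) {
                // Assumption: the replaced unit keeps the destination object id.
                dest.set(i, new Unit(destId, sources.get(next).payload));
                next++;
                replaced = true;
            }
            seen++;
        }
        return replaced;
    }

    /** With m = n = 0 this is the whole-stream replace. */
    static boolean replaceObject(List<Unit> src, long srcId, List<Unit> dest, long destId) {
        return replaceObjectAt(src, srcId, 0, dest, destId, 0);
    }

    /** Removes every occurrence of destId; returns true if anything was removed. */
    static boolean deleteObjects(List<Unit> dest, long destId) {
        return dest.removeIf(u -> u.objId == destId);
    }

    /** Inserts all of src right after the n-th (0-based) occurrence of destId in dest. */
    static boolean spliceAt(List<Unit> dest, long destId, int n, List<Unit> src) {
        int seen = 0;
        for (int i = 0; i < dest.size(); i++) {
            if (dest.get(i).objId == destId) {
                if (seen == n) {
                    dest.addAll(i + 1, src);
                    return true;
                }
                seen++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Unit> src = new ArrayList<>(Arrays.asList(
                new Unit(7, "s0"), new Unit(7, "s1"), new Unit(9, "x")));
        List<Unit> dest = new ArrayList<>(Arrays.asList(
                new Unit(3, "d0"), new Unit(5, "k"), new Unit(3, "d1"), new Unit(3, "d2")));
        // Replace occurrences 1 and 2 of object 3 with occurrences 0 and 1 of object 7.
        replaceObjectAt(src, 7, 0, dest, 3, 1);
        System.out.println(dest);
    }
}
```

In this model the return value reports whether any unit was actually changed, which is one possible reading of "returns true if successful"; the text does not define failure conditions more precisely.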

Collectively, the flexible system and method including the
set of library functions reflected in the APIs of FIGS. 1
through 5 provide a new level of adaptivity allowing matching
of coded media streams to decoding terminal resources,
such as remote laptop PCs or other devices. In addition, the
invention also includes support for user interaction allowing
advanced new functions in conjunction with appropriate
decoders as well as selective decoding of coded media
object bitstreams.
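
The offsetStream() descriptions given for the functionality classes note that decoding may actually start beyond the requested offset, at a valid entry point in the stream. A minimal sketch of that behaviour follows, assuming a sorted table of entry-point byte offsets is available; the table, the class name EntryPointSketch and the -1 return convention are hypothetical, since the text does not say how entry points are located.

```java
import java.util.Arrays;

/** Illustrative helper: snap a requested offset forward to a valid entry point. */
final class EntryPointSketch {

    /**
     * Returns the first entry point at or beyond the requested offset,
     * or -1 if the stream has no entry point from there on.
     * The entry-point table must be sorted in ascending order.
     */
    static long resolveOffset(long[] sortedEntryPoints, long requested) {
        int idx = Arrays.binarySearch(sortedEntryPoints, requested);
        if (idx < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            idx = -idx - 1;
        }
        return idx < sortedEntryPoints.length ? sortedEntryPoints[idx] : -1L;
    }

    public static void main(String[] args) {
        long[] entryPoints = {0L, 4096L, 9000L};
        // A request that falls between entry points is moved to the next one.
        System.out.println(resolveOffset(entryPoints, 5000L));
    }
}
```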

In the implementation of the invention, categories including
defined AV-related Functionalities are introduced, and an
API set is established to enable simpler as well as more
complicated interactions between decoding and composition
of embedded audiovisual objects, all in a universal and
consistent manner.
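
One way to read the consistency mentioned here is that every functionality class exposes the same startDec(), stopDec(), pauseDec() and resumeDec() calls. The sketch below models those four calls as a small state machine; the State enum, the class name DecodeControlSketch and the rule that pause and resume are ignored in other states are assumptions made for illustration and are not stated in the text.

```java
/** Illustrative state machine for the shared decode-control calls (assumed behaviour). */
final class DecodeControlSketch {

    enum State { STOPPED, DECODING, PAUSED }

    private State state = State.STOPPED;

    /** Start decoding of data. */
    void startDec() {
        state = State.DECODING;
    }

    /** Stop decoding of data. */
    void stopDec() {
        state = State.STOPPED;
    }

    /** Temporarily suspend decoding; assumed to be a no-op unless currently decoding. */
    void pauseDec() {
        if (state == State.DECODING) {
            state = State.PAUSED;
        }
    }

    /** Restart decoding from the paused state; assumed to be a no-op otherwise. */
    void resumeDec() {
        if (state == State.PAUSED) {
            state = State.DECODING;
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        DecodeControlSketch dec = new DecodeControlSketch();
        dec.startDec();
        dec.pauseDec();
        dec.resumeDec();
        System.out.println(dec.state());
    }
}
```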

The foregoing description of the system and method of
the invention is illustrative, and variations in construction
and implementation will occur to persons skilled in the art.
For instance, while a compact and universal set of input,
output and mapping functions in three categories have been
described, functions can be added or subtracted from the API
set according to changing network, application or other
needs. The scope of the invention is intended to be limited
only by the following claims.

What is claimed is: 

1. A system for decoding audiovisual objects coded
according to the MPEG-4 standard, comprising:

an interface library containing a predetermined set of
standardized application programming interfaces for
processing audiovisual objects, each of the standardized
programming interfaces having predefined function calls;

a processor, configured to access the interface library, and
to decode and present audiovisual objects according to
function calls related to at least one of the application
programming interfaces.

2. The system of claim 1, wherein the processor unit
executes a client application invoking the function calls.

3. The system of claim 1, further comprising a user input
unit, the system being responsive to a state of decoding,
playback or browsing system resources and to user interaction
provided through the user input unit.

4. The system of claim 1, wherein the interface library
comprises a visual decoding interface to decode visual
object bitstreams.

5. The system of claim 1, wherein the interface library
comprises a functionality interface to provide enhanced user
interaction.

6. The system of claim 1, wherein the interface library
comprises an authoring interface providing bitstream editing
and manipulation capabilities.

7. An operating system using the system of claim 1 to
provide visual, functionality and authoring interfaces using
the interface library.

8. The system of claim 1, further comprising a video
decode and playback unit supporting the interface library.

9. The system of claim 1, further comprising a multimedia
browser module employing the interface library for user
viewing.

10. The system of claim 1, further comprising a multimedia
plug-in module called from a web browser employing
the interface library.

11. A method for decoding audiovisual objects coded
according to the MPEG-4 standard, comprising the steps of:

generating an interface library, the interface library comprising
a predetermined set of standardized application
programming interfaces;

accessing the audiovisual objects using variables related
to at least one of the set of interface definitions in the
interface library; and

decoding the audiovisual objects represented by the variables.

12. The method of claim 11, further comprising the step
of executing a client application, the client application
forming an adaptive system controlling an underlying
MPEG-4 decoding system.

13. The method of claim 11, further comprising the step
of providing a user input, the interfacing being responsive to
a state of decoding, playback, or browsing system resources
and to user interaction provided through the user input unit.

14. The method of claim 11, wherein the interface library
comprises a visual decoding interface to decode visual
object bitstreams.

15. The method of claim 11, wherein the interface library
comprises a functionality interface to provide enhanced user
interaction.

16. The method of claim 11, wherein the interface library
comprises an authorizing interface providing bitstream editing
and manipulation capabilities.

17. The method of claim 11, further comprising the step
of providing an operating system using visual, functionality
and authoring interfaces to a user using the interface library.

18. The method of claim 11, further comprising the step
of decoding and playing back video information using the
interface library.

19. The method of claim 11, further comprising the step
of providing a multimedia browser employing the interface
library.

20. The method of claim 11, further comprising the step
of providing a multimedia plug-in called from a web
browser employing the interface library.

* * * * *